Ensemble learning based compressive strength prediction of concrete structures through real-time non-destructive testing

This study presents an extensive comparative analysis of computational intelligence approaches for predicting the compressive strength (CS) of concrete using two non-destructive testing (NDT) methods: the rebound hammer (RH) test and the ultrasonic pulse velocity (UPV) test. Six widely used ensemble learning algorithms (AdaBoost, CatBoost, gradient boosting tree (GBT), random forest (RF), stacking, and extreme gradient boosting (XGB)) were used to develop NDT-based prediction models for the CS of concrete. The ML models were developed using a total of 721 samples, of which 111 were cast in the laboratory, 134 were obtained from in-situ testing, and the remaining samples were gathered from the literature. The analytical models fall into three categories (RH models, UPV models, and combined RH and UPV models) comprising seven, ten, and thirteen models, respectively. The AdaBoost, CatBoost, GBT, RF, stacking, and XGB models were used to improve on the accuracy and dependability of the analytical models. Among the analytical models, RH-M5, UPV-M6, and C-M6 (combined UPV and RH model) showed the highest performance. The MAPE value of XGB was 84.37%, 83.24%, 77.33%, 59.46%, and 81.08% lower than that of AdaBoost, CatBoost, GBT, RF, and stacking, respectively. The XGB model performed better than the other soft computing techniques and the existing traditional predictive models.


Non-destructive testing
NDT refers to a range of analysis techniques used in engineering and science to assess the properties, integrity, and characteristics of materials, components, or structures without damaging them or altering their physical properties. NDT methods are used to examine and evaluate flaws, defects, or anomalies in a non-invasive manner, ensuring the reliability, safety, and quality of the examined materials or structures 9. The NDT methods commonly used to estimate the CS of concrete are described below:

Rebound hammer
The RH test is one of the most widely used NDT techniques for determining the CS of concrete, offering a practical and reasonably priced method. RH test standards are provided by various countries and regions, including India, the USA, China, the UK, Russia, the European Union, Switzerland, and Japan, as shown in Fig. 1.
The concept behind the hardness test is that the rebound of an elastic mass depends on the hardness of the surface it strikes. The strength of concrete is inversely correlated with the amount of energy it can absorb. Testing starts with carefully choosing and preparing the concrete surface to be tested. Once the surface has been chosen, it should be smoothed with an abrasive stone. The hammer is then driven against the test surface to impart a specific amount of energy.
The plunger should strike perpendicular to the surface. In older RH instruments the inclination angle of the hammer affects the results, but it is unimportant in the latest instruments. The rebound number should be recorded after the impact 10,11. A minimum of ten readings must be taken in each area being analysed. There is no unique relationship between concrete hardness and strength; according to IS 13311 Part 2 12, the rebound number is affected by factors such as cement type, aggregate type, carbonation of concrete, surface condition, concrete age, concrete moisture content, and curing time.

Ultrasonic pulse velocity (UPV)
This method involves measuring the velocity of an ultrasonic wave propagating through a specimen to evaluate its strength and quality characteristics. As a result, a complex system of stress waves is created, including longitudinal (compressional), shear (transverse), and surface (Rayleigh) waves. The longitudinal waves, which travel fastest, are detected by the receiving transducer. The velocity of the ultrasonic wave can be used as a metric to grade the quality of the concrete, with higher velocities indicating better quality and homogeneity, and lower velocities indicating non-uniformity or the presence of defects such as cracks or voids.
To conduct this test, an ultrasonic pulse is introduced into the material under examination, and the time taken for the pulse to traverse the material is recorded. The pulse velocity is then computed by dividing the distance the pulse travelled within the material by the time taken for this traversal. The velocity of the ultrasonic wave is influenced by the density and elastic modulus of the material. Various standard methods are used globally to conduct the UPV test, as shown in Fig. 1. UPV testing methods can be categorized into three groups: direct, semi-direct, and indirect testing, as presented in Fig. 2. According to IS 13311 Part 1 13, factors that can influence the pulse velocity include the surface condition and moisture content of the concrete, the shape and size of the concrete member, the temperature of the concrete, the presence of stress, and the effect of reinforcing bars. It is important to consider these factors to obtain accurate results. The pulse velocity (V) is given by:

$$V = \frac{L}{T}$$

where V, L, and T are the pulse velocity, path length, and effective time, respectively.
The velocity criterion for concrete quality grading according to IS 13311 Part 1 13 is shown in Table 2, and concrete quality classification based on the RH and UPV values is shown in Table 3 14 .
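As a minimal sketch of the calculation described above, the following Python snippet computes the pulse velocity from the path length and transit time and maps it to the quality bands of Table 2 (above 4.5 km/s excellent, 3.5 to 4.5 good, 3.0 to 3.5 medium, below 3.0 doubtful). The function names, the choice of units (mm and µs), and the treatment of values that fall exactly on a band boundary are assumptions made for illustration only.

```python
def pulse_velocity_km_s(path_length_mm: float, transit_time_us: float) -> float:
    """Pulse velocity V = L / T. One mm per microsecond equals one km per second."""
    if transit_time_us <= 0:
        raise ValueError("transit time must be positive")
    return path_length_mm / transit_time_us


def concrete_quality(velocity_km_s: float) -> str:
    """Grade concrete quality from the UPV value using the Table 2 bands.

    Assumption: boundary values are assigned as 4.5 -> Good, 3.5 -> Good, 3.0 -> Medium.
    """
    if velocity_km_s > 4.5:
        return "Excellent"
    if velocity_km_s >= 3.5:
        return "Good"
    if velocity_km_s >= 3.0:
        return "Medium"
    return "Doubtful"


# Hypothetical reading: a 150 mm cube traversed in 37.5 microseconds.
velocity = pulse_velocity_km_s(150.0, 37.5)
grade = concrete_quality(velocity)
```

For the hypothetical reading above, the velocity is 4.0 km/s, which falls in the "Good" band.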

Materials
OPC 43 grade cement was used in the present work. The consistency, fineness, specific gravity, and CS after 672 ± 4 h of the Ordinary Portland Cement were 30%, 310 m²/kg, 3.14, and 47.31 MPa, respectively. The coarse aggregates, 20 mm and 10 mm in size, were naturally crushed, with a fineness modulus of 2.25 and a specific gravity of 2.71. The fine aggregates were natural, with specific gravity and fineness modulus values of 2.69 and 2.78 (Zone III), respectively. The design mix was prepared according to IS 10262: 2016 15. Test specimens were batched onsite using different concrete mix designs (cement, water, coarse aggregate, fine aggregate, admixture, and w/c ratio), with nominal 28-day CS of 22 MPa to 44 MPa and a slump flow of more than 100 mm. A total of 111 samples were cast, and each cube measured 150 mm × 150 mm × 150 mm. Figure 3 depicts the complete procedure from the casting phase to the testing phase.

CS using RH
The CS of concrete can be quickly and conveniently estimated using the NDT method known as the "RH test". The RH, often referred to as a Schmidt hammer, consists of a spring-controlled mass that slides along a plunger inside a tubular casing. Before testing, all samples were taken out of the curing tank and kept in the laboratory environment for roughly 24 h. Small circles of 15 mm were then marked on two faces of the concrete cube. The arrangement of the circles and their centre-to-centre spacing is presented in Fig. 4a. A total of twenty-five markings were made, as shown in Fig. 4b. The concrete cube specimens were placed in a compression testing machine and subjected to a constant load of approximately 7 N/mm² (based on the impact energy of the hammer) (Fig. 4c). The rebound number was measured, and the CS was calculated according to IS 516:1959 16. In the RH test, 10 to 12 rebound number readings from each test location were measured on two faces of the cube specimen (Fig. 4d).

Velocity criterion for concrete quality grading (Table 2):
Concrete quality    Excellent   Good         Medium       Doubtful
UPV value (km/s)    > 4.5       3.5 to 4.5   3.0 to 3.5   < 3.0
The average rebound value of each test region is the arithmetic average of the readings that remain after the maximum and minimum sets of results 17 have been removed:

$$R_m = \frac{1}{n}\sum_{i=1}^{n} RN_i$$

where R_m and RN_i are the average value of each test area and the measured value of each impact point, respectively, and n is the number of readings retained.
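The averaging step can be sketched in a few lines of Python. The text does not state how many of the highest and lowest readings are discarded, so the parameter `n_trim` below is a hypothetical illustration, as are the function name and the sample readings.

```python
def mean_rebound(readings, n_trim=1):
    """Average rebound number after dropping the n_trim lowest and n_trim highest readings.

    n_trim is an assumed parameter; the source does not specify how many
    readings make up the discarded maximum and minimum sets.
    """
    if n_trim < 0 or len(readings) <= 2 * n_trim:
        raise ValueError("not enough readings left after trimming")
    kept = sorted(readings)[n_trim:len(readings) - n_trim]
    return sum(kept) / len(kept)


# Hypothetical rebound numbers from one test area (ten impacts).
example_readings = [30, 32, 34, 36, 38, 40, 42, 44, 46, 48]
r_m = mean_rebound(example_readings, n_trim=1)
```

With one reading trimmed from each end, the example averages the eight remaining values (32 to 46).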

Density using UPV
The working principle of UPV is described in the subsection "Ultrasonic pulse velocity (UPV)". For the UPV test, markings were made on two faces of the concrete cube. The probe diameter is 50 mm, and five markings were made on each face, as shown in Fig. 5a and b. On average, only two readings were taken from the two selected faces, as shown in Fig. 5d. Probes of 54 kHz were used with the direct testing method, as shown in Fig. 5c.
The average UPV value is obtained from:

$$UPV_m = \frac{1}{n}\sum_{i=1}^{n} UPV_i$$

where UPV_m and UPV_i are the average value of each test area and the measured value of each point, respectively, and n is the number of readings.

CS using UTM
After the RH and UPV testing, each sample was tested in the universal testing machine (UTM) according to IS 516: 1959 16. The load is applied gradually and continuously, without shock, and increased until the cube either reaches its peak capacity or shows signs of cracking. The maximum load at which the cube fails is recorded along with the type of failure (crushing, splitting, etc.), as shown in Fig. 6. IS 516 provides guidelines for the testing of concrete cubes, and its procedure was followed to obtain accurate results.

Collected database
To collect RH and UPV data for concrete cubes, a thorough review of the literature was conducted. From the published studies, 476 datasets were gathered 5,18-21. From in-situ NDT, 134 datasets were obtained. Furthermore, a total of 111 concrete cubes were cast and tested in the laboratory using the RH, UPV, and UTM methods. In all, 721 datasets were used to construct the ML models. Figure 7 shows the full approach used to accomplish the goal of this study. Table 4 presents the statistical characteristics (minimum, maximum, mean, standard deviation (SD), and kurtosis (K_u)) of the literature, in-situ, laboratory, and merged datasets.
A probability histogram is a graph that lists all possible outcomes along the x-axis and the likelihood of each outcome on the y-axis. The probability distribution is depicted graphically in Fig. 8.

Processing of data
Data processing is an important step in ML workflows. Data standardization is the process of transforming data into a common format or scale so that it can be easily compared and combined with other data. This is particularly important when working with data from different sources, as each source may have its own format or scale. In this work, min-max scaling was used to normalize the input and output datasets. Standardization or normalization of the dataset is important because it allows for more accurate and fair comparisons between data and can improve the performance of ML algorithms 22,23.
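Min-max scaling maps each variable linearly onto the range 0 to 1 using its observed minimum and maximum. The sketch below is a plain-Python illustration of that transformation and its inverse; the function names and the sample values are assumptions, and the study itself may have used a library implementation.

```python
def min_max_fit(values):
    """Return the (minimum, maximum) pair used to scale a variable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant variable")
    return lo, hi


def min_max_scale(values, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1."""
    return [(v - lo) / (hi - lo) for v in values]


def min_max_inverse(scaled, lo, hi):
    """Map scaled values back to the original units (e.g. MPa for predicted CS)."""
    return [s * (hi - lo) + lo for s in scaled]


# Hypothetical compressive strengths in MPa.
cs_values = [20.0, 30.0, 40.0]
cs_lo, cs_hi = min_max_fit(cs_values)
cs_scaled = min_max_scale(cs_values, cs_lo, cs_hi)
cs_restored = min_max_inverse(cs_scaled, cs_lo, cs_hi)
```

Because the output variable is scaled as well, model predictions have to be passed through the inverse transform before errors are reported in MPa. Fitting the minimum and maximum on the training split only is a common precaution against information leaking from the test split.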

Prediction models
To predict the CS using NDT methods, both mathematical and ML models were applied. The mathematical models are divided into three categories: (a) prediction models based on RH, (b) prediction models based on UPV, and (c) prediction models based on combined RH and UPV values. The ensemble-based ML algorithms AdaBoost, CatBoost, GBT, RF, stacking, and XGB were applied to develop the CS prediction models. Detailed information on the mathematical and ML models is given in the subsequent sections.

Mathematical models
The analytical models based on RH, UPV, and combined RH and UPV are listed in Tables 5, 6 and 7, respectively, with their year of publication. These models are the most widely acknowledged empirical relationships in the literature for calculating the CS of concrete using NDT techniques. The equations rely on UPV, RH, or a combination of both, but they tend to show considerable deviation, giving predictions that differ significantly from the actual values. A small correction was applied to the CS predicted by three analytical models (UPV-M8, C-M10, and C-M13): the correction factor in the existing model is divided by 10000, 10, and 10, respectively.

ML models
Ensemble learning (EL) combines diverse models, notably through bagging (bootstrap aggregating) and boosting. Bagging trains multiple model instances on different data subsets and combines their predictions through voting or averaging. Boosting trains weak models sequentially, each correcting the errors of the previous one, and combines their predictions with weighted emphasis. Models such as random forest (RF), gradient boosting trees (GBT), AdaBoost, and XGBoost (XGB) are prominent for their strong performance and lower overfitting risk. This study applied six EL models to improve on the accuracy of the existing models, showcasing the capability of EL to improve predictive results. An overview of the ML models is given in Table 8.
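To make the boosting idea concrete, the sketch below fits a sequence of one-split regression stumps, each trained on the residual errors left by the stumps before it. It is a deliberately small, plain-Python illustration for a single input feature with squared-error loss. It is not the GBT or XGB implementation used in the study, and the function names, learning rate, and toy data are all assumptions.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a one-split regression stump that minimises squared error on the residuals."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest value so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def fit_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Sequentially add stumps, each one correcting the current residual errors."""
    base = mean(ys)  # round zero: predict the mean of the targets
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def training_mse(model, xs, ys):
    """Mean squared error of a fitted model on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical rebound numbers and compressive strengths (MPa).
toy_rh = [20, 25, 30, 35, 40, 45]
toy_cs = [15.0, 20.0, 27.0, 33.0, 41.0, 50.0]
boosted = fit_boosting(toy_rh, toy_cs, n_rounds=50)
```

On the toy data the training error shrinks as rounds are added, which is the error-correcting behaviour described above. Bagging differs in that its members are trained independently on resampled data and then averaged.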

Model validation
Several metrics are commonly used for model validation in regression problems, such as the correlation coefficient (R), root mean square error (RMSE), mean absolute percentage error (MAPE), and mean absolute error (MAE). An R-value close to one and a lower RMSE (approaching zero) both indicate a better fit. A Nash-Sutcliffe (NS) efficiency index equal to one indicates a good fit between the experimental and predicted values. A higher a20-index value indicates a better fit. It is always preferable to use multiple metrics to evaluate the performance of an individual model 54-63.
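The six metrics can be computed directly from paired experimental and predicted values. The sketch below uses the usual textbook definitions, which are assumed here because this section does not reproduce the formulas. In particular, the a20-index is taken as the fraction of samples whose experimental-to-predicted ratio lies between 0.80 and 1.20. The function name and the sample values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R, RMSE, MAE, MAPE (%), NS, and a20-index for paired values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    errors = [t - p for t, p in zip(y_true, y_pred)]
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    sse = sum(e ** 2 for e in errors)
    return {
        "R": cov / math.sqrt(var_t * var_p),          # Pearson correlation coefficient
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
        "NS": 1.0 - sse / var_t,                      # Nash-Sutcliffe efficiency
        "a20": sum(1 for t, p in zip(y_true, y_pred) if 0.8 <= t / p <= 1.2) / n,
    }


# Hypothetical experimental and predicted strengths in MPa.
experimental = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 36.0]
scores = regression_metrics(experimental, predicted)
```

R only measures linear association, so a model can score a high R while being biased. This is one reason to read it alongside the error-based metrics, as the analytical model results below illustrate.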

Mathematical models
The mathematical models are divided into three categories: (i) RH models, (ii) UPV models, and (iii) combined RH and UPV models, as mentioned in section "Prediction models". Among the RH models, RH-M2 has the highest R-value. The values of all performance metrics are shown in Table 9, and the scatter plots of all the RH models are shown in Fig. 9. Taking all the performance indices together, RH-M5 is the most precise of the RH models.
Among the UPV models, UPV-M7 has a higher R-value than the others, but its errors are higher than those of UPV-M6. The MAE value of the UPV-M6 model is 20.48% lower than that of UPV-M7, and its RMSE and MAPE values are 22.71% and 15.62% lower, respectively. The NS and a20-index of the UPV-M8 and UPV-M2 models are greater.
Nevertheless, the overall performance of the UPV-M6 model is good, so it can be inferred that UPV-M6 outperforms the other UPV models. The scatter plots and the values of all performance metrics are presented in Fig. 10 and Table 10, respectively.
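Overall, the C-M6 model demonstrates superior performance compared with the other combined models as well as the other two categories. Figure 11 and Table 11 present the scatter plots and performance metric values of the combined models, among which C-M6 exhibits the lowest MAE, RMSE, and MAPE values.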

ML models
In this study, six ML models were developed and evaluated using six distinct performance metrics. In addition to these metrics, scatter plots, absolute error plots, and grouped marginal plots were used to display the accuracy and errors of the models. A raincloud plot and Taylor diagrams were used to compare the analytical and ML models. The AdaBoost model exhibits an R-value of 0.9280 and a MAPE value of 10.81%, with NS and a20-index values of 0.8610 and 0.8724, respectively. The CatBoost, GBT, RF, stacking, and XGB models display correlation coefficients of 0.9349, 0.9627, 0.9877, 0.9536, and 0.9970, respectively. Among all ML models, XGB has the highest R-value, which is 7.44%, 6.64%, 3.56%, 0.94%, and 4.55% higher than that of AdaBoost, CatBoost, GBT, RF, and stacking, respectively. Similarly, the NS and a20-index values of the XGB model are higher than those of all other developed ML models. The MAPE and MAE values of the XGB model are also significantly lower than those of the AdaBoost, CatBoost, GBT, RF, and stacking models, with reductions of 84.37%, 83.24%, 77.33%, 59.46%, and 81.08% for MAPE, and 84.99%, 83.43%, 77.60%, 60.37%, and 82.31% for MAE, respectively. Based on all performance metrics, the XGB model exhibits good precision.
The graphical representations of all the developed models are shown in Fig. 12a-f. Table 12 lists the values of all performance metrics for the developed ML models. Each of these figures contains three plots. The first is a scatter plot of the training and testing datasets. The second is a grouped marginal plot, which combines a scatter plot with density curves along the margins to represent the distribution of multiple variables in a single plot. The last shows the absolute error values of the training and testing datasets. In a grouped marginal plot, each group of observations is represented by a different colour (red for experimental values and blue for predicted values), and the density curves along the margins show the distribution of each variable for each group. This plot helps to visualize the relationship between two variables and the distribution of each variable across the groups in the data.
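The relative improvements in R quoted above follow from the reported correlation coefficients. As a quick check, the snippet below recomputes them as (R_XGB − R_other) / R_other × 100; the helper name is an assumption, while the R-values are those stated in the text.

```python
def percent_higher(value, reference):
    """Relative difference of value over reference, in percent."""
    return 100.0 * (value - reference) / reference


# Correlation coefficients reported in the text.
r_xgb = 0.9970
r_others = {
    "AdaBoost": 0.9280,
    "CatBoost": 0.9349,
    "GBT": 0.9627,
    "RF": 0.9877,
    "stacking": 0.9536,
}

r_gain_percent = {name: round(percent_higher(r_xgb, r), 2) for name, r in r_others.items()}
# r_gain_percent -> {'AdaBoost': 7.44, 'CatBoost': 6.64, 'GBT': 3.56, 'RF': 0.94, 'stacking': 4.55}
```

The same formula with the sign reversed, (other − XGB) / other × 100, gives the percentage reductions quoted for MAPE and MAE.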

Comparison between mathematical and ML models
The performance of the analytical and developed ML models was also compared with that of existing ML models.
In addition, the a20-index of the XGB model is higher than that of all the analytical and existing ML models. The MAPE value of the XGB model is 75.03%, 86.78%, 91.20%, 93.78%, and 88.67% lower than that of Shih et al. 6, Asteris et al. 8, RH-M5, UPV-M6, and C-M6, respectively. In a nutshell, the XGB model outperforms both the existing ML models and the analytical models. Taylor diagrams and raincloud plots were used to compare the performance of the ML and analytical models. A Taylor diagram relates the R-value, the RMSE value, and the standard deviation. The dark dotted blue line represents the standard deviation of the experimental dataset, and the green star marks the position of the best-fitted model. Figure 13a shows the Taylor plot of the RH models; none of them fits well, because their RMSE values are all above 6 MPa. Figure 13b shows the Taylor plot of the UPV models; none of them fits well either, as their RMSE values are all above 9 MPa. Figure 13c and d show the Taylor plots of the combined RH and UPV models. Only two models (C-M5 and C-M6) in Fig. 13c have RMSE values below 6 MPa. Several analytical models (RH-M1, RH-M2, RH-M7, UPV-M1, UPV-M4, C-M1, C-M2, C-M7, C-M8, C-M10, and C-M13) lie far from the ideal position in the Taylor plot, which can be attributed to the pronounced disparity in their standard deviation values. The Taylor plot of the developed ML models is shown in Fig. 13e. The RMSE values of the GBT, RF, stacking, and XGB models are below 3 MPa, and among these the XGB model shows the best fit. Figure 13f shows the Taylor plot of the best analytical models (RH-M5, UPV-M6, and C-M6) and the best-fitted ML model (XGB).
The raincloud plot was also used to compare the performance of the selected analytical models (RH-M5, UPV-M6, and C-M6) and all the ML models, as shown in Fig. 14. This plot also indicates that the XGB model performs better than all the analytical models and the other developed ML models.

Sensitivity analysis
Sensitivity analysis is a technique for determining how the uncertainty in the output of a model can be attributed to variations in its inputs. SHAP (SHapley Additive exPlanations) is a unified approach to explaining the output of any ML model. It uses Shapley values, a well-established concept from cooperative game theory, to explain the output of a model by assigning a contribution to each feature 64.
For the XGB model, SHAP values can be used for sensitivity analysis by calculating the influence of each feature on the model's predictions. The magnitude and direction of the SHAP values associated with each feature identify the features that most strongly affect the predictions, and show how changing a feature value will change the prediction of the developed model. This information is useful for interpreting the model's behaviour and for making decisions about feature selection. The RH value has a greater impact on the CS of concrete than the UPV value: it accounts for 88.19% of the influence, and the rest is contributed by the UPV value, as shown in Fig. 15.
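The idea behind the Shapley attribution can be illustrated without any ML library. The sketch below computes exact Shapley values for a two-feature model by averaging each feature's marginal contribution over all feature orderings, then converts the mean absolute values into percentage shares. The linear `toy_model`, its coefficients, the sample data, and the normalisation into percentages are all assumptions for illustration. The study applied SHAP to its trained XGB model, and the text does not state how the 88.19% figure was derived.

```python
from itertools import permutations


def shapley_values(model, x, background):
    """Exact Shapley values for one sample, averaged over all feature orderings."""
    n_features = len(x)

    def coalition_value(present):
        # Mean model output when features in `present` take the sample's values
        # and the remaining features take values from each background row.
        total = 0.0
        for row in background:
            mixed = [x[j] if j in present else row[j] for j in range(n_features)]
            total += model(mixed)
        return total / len(background)

    phi = [0.0] * n_features
    orderings = list(permutations(range(n_features)))
    for order in orderings:
        present = set()
        for j in order:
            before = coalition_value(present)
            present.add(j)
            phi[j] += coalition_value(present) - before
    return [p / len(orderings) for p in phi]


def influence_percent(model, samples, background):
    """Share of each feature in the mean absolute Shapley value, in percent."""
    n_features = len(samples[0])
    mean_abs = [0.0] * n_features
    for x in samples:
        for j, p in enumerate(shapley_values(model, x, background)):
            mean_abs[j] += abs(p) / len(samples)
    total = sum(mean_abs)
    return [100.0 * m / total for m in mean_abs]


def toy_model(features):
    """Hypothetical stand-in for a trained model: CS from (RH, UPV)."""
    rh, upv = features
    return 1.2 * rh + 3.0 * upv


# Hypothetical (rebound number, UPV in km/s) pairs.
background_rows = [[30.0, 4.0], [40.0, 4.5], [50.0, 5.0]]
phi_rh, phi_upv = shapley_values(toy_model, [45.0, 4.2], background_rows)
shares = influence_percent(toy_model, background_rows, background_rows)
```

The two contributions always sum to the difference between the prediction for the sample and the mean prediction over the background data. Enumerating orderings is only practical for a handful of features, which is why tree-specific algorithms are normally used for models such as XGB.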

Conclusions
The compressive strength of concrete based on NDT techniques has been evaluated in the present study using both analytical and ML models. The three groups of analytical models (RH models, UPV models, and combined RH and UPV models) consist of seven, ten, and thirteen models, respectively. The ensemble-based ML algorithms (AdaBoost, CatBoost, GBT, RF, stacking, and XGB) were used to improve on the accuracy of the existing models. Six performance metrics were employed to evaluate the accuracy of both the analytical and ML models. Furthermore, graphical representations such as scatter plots, absolute error plots, and grouped marginal plots were used to analyse the fit of the ML models. Taylor and raincloud plots were also used to compare the performance of the selected analytical models and the developed ML models. Based on the performance metrics and graphical representations, the following conclusions can be drawn:
• Among the selected analytical models, the correlation coefficients of the RH model (RH-M5), UPV model (UPV-M6), and combined RH and UPV model (C-M6) are 0.7931, 0.5788, and 0.8967, respectively. Similarly, the NS and a20-index of the C-M6 model are higher than those of the RH and UPV models, with values of 0.9565 and 0.6602, respectively.
• The performance of all the developed ML models is higher than that of the existing analytical models. Among the ML models, the precision of the XGB model is the highest in terms of R, RMSE, MAPE, and MAE values.
• The R-value of the XGB model is 25.71%, 72.25%, and 11.19% higher than that of the RH-M5, UPV-M6, and C-M6 models, respectively.

Figure 1. Standards of RH and UPV testing.

Figure 4. Concrete cubes: (a) marking of 15 mm circles for the RH test, (b) assigning a number to each circle, and (c) impression of the RH test.

Figure 5. Concrete cubes: (a) marking of 50 mm circles for the UPV test, (b) assigning a number to each circle, (c) testing of samples with the UPV instrument, and (d) impression of the UPV test.

Figure 6. Testing of the specimen under UTM: (a) setting of the specimen under UTM, (b) application of load, and (c) failure of the specimen.

Figure 8. Histogram probability plots: (a) data from the literature, (b) in-situ data, (c) laboratory data, and (d) all data combined.

Figure 14. Raincloud plot showing the error comparison of the best analytical models and the developed ML models.

Table 1. Summary of previously established ML models to determine the CS using RH and UPV values.

Table 2. Velocity criterion for concrete quality grading.

Table 3. Concrete quality classification based on RH and UPV.

Table 4. Statistical parameters of all the datasets used to develop ML models.

Table 5. List of analytical models based on RH.

Table 6. List of analytical models based on UPV.

Table 7. List of analytical models based on combined RH and UPV.

Table 8. Description of ML models.

Table 9. Results of RH models.

Table 11. Results of combined RH and UPV models.
further confirms the high precision of the XGB model. Based on the performance indices and graphical analysis, the ML models rank in descending order as follows: XGB, RF, GBT, stacking, CatBoost, and AdaBoost.

Table 12. Results of ML models.

Table 13. Comparison of existing ML models with best-fitted analytical and ML models.